Today we will…
Artwork by Allison Horst
Different formats of the data are tidy in different ways.
Look at the file extension for the type of data file.
.csv : “comma-separated values”
Name, Age
Bob, 49
Joe, 40
.xls, .xlsx: Microsoft Excel spreadsheet (convert to .csv, or read with the readxl package)
.txt: plain text
Using base R functions:
read.csv() is for reading in .csv files.
read.table() and read.delim() are for any data with “columns” (you specify the separator).
The tidyverse has some cleaned-up versions in the readr and readxl packages:
read_csv() is for comma-separated data.
read_tsv() is for tab-separated data.
read_table() is for white-space-separated data.
read_delim() is for any data with “columns” (you specify the separator). The functions above are special cases.
read_excel() is specifically for dealing with Excel files.
Remember to load the readr and readxl packages first!
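A minimal sketch of these readers, using the small Name/Age table from earlier written to a temporary file. The object names and the temporary path are illustrative assumptions, and the Excel call is left commented out because no spreadsheet is available here.

```r
library(readr)

# Write the example table to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age", "Bob,49", "Joe,40"), csv_path)

# Base R: returns a data.frame.
people_base <- read.csv(csv_path)

# readr: returns a tibble; show_col_types = FALSE silences the column message.
people_readr <- read_csv(csv_path, show_col_types = FALSE)

# read_delim() is the general form: here the separator is given explicitly.
people_delim <- read_delim(csv_path, delim = ",", show_col_types = FALSE)

# Excel files need the readxl package (hypothetical path, not run):
# library(readxl)
# people_excel <- read_excel("path/to/file.xlsx")
```

All three objects hold the same two rows; the readr versions differ mainly in returning tibbles.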
The Grammar of Graphics (GoG) is a principled way of specifying exactly how to create a particular graph from a given data set. It helps us to systematically design new graphs.
Think of a graph or a data visualization as a mapping…
…FROM variables in the data set (or statistics computed from the data)…
…TO visual attributes (or “aesthetics”) of marks (or “geometric elements”) on the page/screen.
ggplot2: elegant graphics for data analysis by Hadley Wickham
The grammar makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on specific chart types.
data: data frame containing the variables
aes: aesthetic mappings (position, color, symbol, …)
geom: geometric element (point, line, bar, box, …)
stat: statistical variable transformation (identity, count, linear model, quantile, …)
scale: scale transformation (log scale, color mapping, axis tick breaks, …)
coord: coordinate system (Cartesian, polar, map projection, …)
facet: divide into subplots using a categorical variable
ggplot2
Complete this template to build a basic graphic:
Use + to add layers to a graphic.
We map variables (columns) from the data to aesthetics on the graphic using the aes() function.
What aesthetics can we set (see the ggplot2 cheat sheet for more)?
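One way the basic template could be filled in, sketched with the mpg dataset that ships with ggplot2. The placeholder names in the comment are an assumption about the general pattern, not the slide's exact template.

```r
library(ggplot2)

# General pattern (placeholders are illustrative):
#   ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
p_basic <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy)) +
  geom_point()

p_basic
```

Here data supplies the data frame, aes() maps columns to the x and y positions, and + adds the point layer.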
Global Aesthetics
Local Aesthetics
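A minimal sketch of the difference, assuming the mpg dataset and a linear smoother chosen only for illustration. Aesthetics set in ggplot() are inherited by every layer, while aesthetics set inside a geom apply to that layer alone.

```r
library(ggplot2)

# Global: color is mapped in ggplot(), so both layers inherit it
# and geom_smooth() draws one line per drive type.
p_global <- ggplot(data = mpg,
                   mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local: color is mapped only inside geom_point(),
# so geom_smooth() draws a single line for all the data.
p_local <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```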
We use a geom_xxx() function to represent data points.
one variable: geom_density(), geom_dotplot(), geom_histogram(), geom_boxplot()
two variables: geom_point(), geom_line(), geom_density_2d()
three variables: geom_contour(), geom_raster()
Not an exhaustive list – see the ggplot2 cheat sheet.
To create a specific type of graphic, we will combine aesthetics and geometric objects.
Let’s try it!
Start with the TX housing data.
Make a plot of median house price over time (including both individual data points and a smoothed trend line), distinguishing between different cities.
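One possible approach, sketched with the txhousing dataset from ggplot2. Restricting the data to three cities and using a loess smoother are assumptions made to keep the plot readable. They are not part of the exercise statement.

```r
library(ggplot2)

# Assumption: keep three cities so the colors stay distinguishable.
tx_subset <- subset(txhousing, city %in% c("Austin", "Dallas", "Houston"))

p_tx <- ggplot(data = tx_subset,
               mapping = aes(x = date, y = median, color = city)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, na.rm = TRUE)

p_tx
```

Because color is mapped globally, both the points and the trend lines are split by city.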
stat
A stat transforms an existing variable into a new variable to plot.
identity leaves the data as is.
count counts the number of observations.
summary allows you to specify a desired transformation function.
Sometimes these statistical transformations happen under the hood when we call a geom.
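The three stats can be sketched on the mpg dataset as follows. The helper table class_counts and the choice of mean as the summary function are illustrative assumptions.

```r
library(ggplot2)

# count: geom_bar() counts the rows in each class under the hood.
p_count <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# identity: the counts are computed beforehand and plotted as is.
class_counts <- as.data.frame(table(class = mpg$class), responseName = "n")
p_identity <- ggplot(data = class_counts, mapping = aes(x = class, y = n)) +
  geom_col()

# summary: stat_summary() applies a function of our choosing (here the mean).
p_summary <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")
```

The first two plots show the same bars; they differ only in whether ggplot2 or the user does the counting.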
facet
Extracts subsets of data and places them in side-by-side graphics.
facet_grid(. ~ b): facet into columns based on b
facet_grid(a ~ .): facet into rows based on a
facet_grid(a ~ b): facet into both rows and columns
facet_wrap( ~ b): wrap facets into a rectangular layout
You can set scales to let axis limits vary across facets:
facet_grid(y ~ x, scales = ______)
"free" – both x- and y-axis limits adjust to individual facets
"free_x" – only x-axis limits adjust
"free_y" – only y-axis limits adjust
You can set a labeller to adjust facet labels:
facet_grid(. ~ fl, labeller = label_both)
facet_grid(. ~ fl, labeller = label_bquote(alpha ^ .(x)))
facet_grid(. ~ fl, labeller = label_parsed)
Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
position = 'dodge': Arrange elements side by side.
position = 'fill': Stack elements on top of one another and normalize height.
position = 'stack': Stack elements on top of one another.
position = 'jitter': Add random noise to the x and y position of each element to avoid overplotting (see geom_jitter()).
It is good practice to put each geom and aes on a new line.
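A short sketch of faceting and position adjustments on the mpg dataset. The variable choices are illustrative assumptions, and facet_wrap() is used for the free-scale example so that each panel gets its own y-axis.

```r
library(ggplot2)

# One panel per drive type; only the y-axis limits vary across panels,
# and label_both prints "drv: 4" style labels.
p_facet <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv, scales = "free_y", labeller = label_both)

# Bars for each drive type placed side by side within a class.
p_dodge <- ggplot(data = mpg,
                  mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Bars stacked and rescaled so every class has total height 1.
p_fill <- ggplot(data = mpg,
                 mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "fill")
```

Each geom and aes() call sits on its own line, which makes it easy to change one piece at a time.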
How would you make this plot from the diamonds dataset in ggplot2?
data
aes
geom
facet
There are a lot of pieces to put together when creating a good graphic.
This game plan should include:
What geom do you need?
What aes's do you need?
Use the mpg dataset to create two side-by-side scatterplots of city MPG vs. highway MPG where the points are colored by the drive type (drv). The two plots should be separated by year.
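One way to approach the mpg exercise, offered as a sketch rather than the intended solution. It assumes "city MPG vs. highway MPG" means cty on the y-axis and hwy on the x-axis.

```r
library(ggplot2)

p_mpg <- ggplot(data = mpg,
                mapping = aes(x = hwy, y = cty, color = drv)) +
  geom_point() +
  facet_grid(. ~ year)  # one column per model year

p_mpg
```

The game plan maps onto the code directly: the data is mpg, the aesthetics are x, y, and color, the geom is a point, and the facet variable is year.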
Today we will…
Graphics consist of:
Structure: boxplot, scatterplot, etc.
Aesthetics: features such as color, shape, and size that map other variables to structural features.
Both the structure and aesthetics should help viewers interpret the information.
Edward R. Tufte is a well-known critic of visualizations, and his definition of graphical excellence consists of:
When creating graphics, we need to think carefully about how we make structure and aesthetic decisions.
Our brains have an amazing ability to create and perceive structure among visual objects.
Objects with the same visual properties are assumed to be similar and are grouped together.
Use design elements such as shape and color to indicate groupings of the data.
Objects that are close together are perceived as a group.
Since physical distance connotes similarity, grouping bars on a chart can indicate similarities among their data.
Elements that are aligned (on the same line, curve, or plane) are perceived to be more closely related to each other than to other elements.
It is often easier for us to perceive the groupings if the shapes are curves, rather than lines with sharp edges.
Objects that appear to have a boundary around them are perceived as being related.
Objects that are connected, such as by a line, are perceived as a group.
Complex arrangements of visual elements are perceived as a single, recognizable pattern.
Objects are perceived as either standing out prominently in the foreground of an image or receding into the background.
Whatever stands out visually is perceived as the most important. It will grab our attention first and hold it for the longest.
| Gestalt Hierarchy | Graphical Feature |
|---|---|
| 1. Enclosure | Facets |
| 2. Connection | Lines |
| 3. Proximity | White Space |
| 4. Similarity | Color/Shape |
Implications for practice:
The next slide will have one point that is not like the others.
Raise your hand when you notice it.
Pre-attentive features are features that we see and perceive before we even think about it.
They will jump out at us in less than 250 ms.
E.g., color, form, movement, spatial location.
There is a hierarchy of features:
Usually no more than 7 colors:
You can use colorRampPalette() (from grDevices, which ships with base R) to produce larger palettes by interpolating existing ones, such as those from the RColorBrewer package.
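A minimal sketch of that interpolation, assuming the "Set2" palette from RColorBrewer as the starting point. The palette choice and the target of 12 colors are illustrative.

```r
library(RColorBrewer)

# Start from an existing 8-color qualitative palette.
base_palette <- brewer.pal(n = 8, name = "Set2")

# colorRampPalette() returns a function that interpolates between these colors.
expand_palette <- colorRampPalette(base_palette)

# Ask that function for as many colors as needed.
twelve_colors <- expand_palette(12)
```

Interpolated colors sit closer together than the originals, so larger palettes become harder to tell apart.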
Use color gradient with only one hue for positive values:
Use color gradient with two hues for positive and negative values. Gradient should go through a light, neutral color (white).
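Both kinds of gradient can be sketched with ggplot2's scale functions. The toy grid, its column names, and the specific hex colors are assumptions made for illustration.

```r
library(ggplot2)

# Toy data: one all-positive variable and one that crosses zero.
grid <- expand.grid(x = 1:10, y = 1:10)
grid$positive <- grid$x * grid$y
grid$signed <- grid$x - grid$y

# One hue (light blue to dark blue) for positive values.
p_seq <- ggplot(data = grid, mapping = aes(x = x, y = y, fill = positive)) +
  geom_raster() +
  scale_fill_gradient(low = "#DEEBF7", high = "#08519C")

# Two hues passing through white at zero for positive and negative values.
p_div <- ggplot(data = grid, mapping = aes(x = x, y = y, fill = signed)) +
  geom_raster() +
  scale_fill_gradient2(low = "#B2182B", mid = "white", high = "#2166AC",
                       midpoint = 0)
```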
There are several packages with color scheme options:
These packages have color palettes that are aesthetically pleasing and, in many cases, colorblind friendly.
You can also take a look at other ways to find nice color palettes.